ABSTRACT


A  great  deal  of  study  has  been  conducted  in  the  last  ten 
years  concerning  the  use  of  voice  recognition  equipment  with 
computers.   It  was  hoped  that  its  use  would  reduce  the 
required  entry  time  and  error  rate,  and  improve  the  man- 
machine  interface  between  the  user  and  the  computer. 

There  are  many  potential  applications  for  such  voice
recognition  use  in  the  military,  and  specifically  in  the  area
of  Command,  Control  and  Communications  (C3).   War  games  are
often  used  today  to  test  the  effectiveness  of  C3  technologies,
and  WES  is  one  such  war  game.

This  paper  will  assess  the  feasibility  of  using  voice
recognition  equipment  to  run  WES  by  comparing  the  results  of
an  experiment  employing  both  voice  and  manual  typing  input
modes.   The  results  show  that  in  this  particular  task  typing
does  a  somewhat  better  job  than  the  buffered  voice  mode,  while
unbuffered  voice  has  very  poor  results.
I.   INTRODUCTION

The  cost  of  computer  hardware  has  dropped  dramatically 
in  recent  years,  and  the  use  of  computers  throughout  our 
society  has  skyrocketed  to  help  us  manage  the  glut  of  data
with  which  we  are  often  presented  and  to  solve  the  increasingly
complex  problems  of  the  present  and  future.   Historically,
data  has  been  entered  into  the  computer  by  keypunch,
which  can  be  slow,  monotonous  and  error-filled  for  all  but 
the  very  well-trained.   Researchers  have  looked  for  a  better, 
more  efficient  man-machine  interface  than  the  keypunch,  and 
as  early  as  the  1950's  they  realized  that  the  most  natural
type  of  communication  which  we  as  humans  use  is  speech.   So 
why  not  simply  speak  to  a  computer  as  you  would  to  a  fellow 
worker  and  have  the  computer  perform  whatever  task  you  have 
directed? 

A.   A  BRIEF  HISTORY  OF  VOICE  TECHNOLOGY 
1.   General  Background 

Voice  recognition  systems  have  received  quite  a  bit 
of  interest  since  the  1950's,  mainly  during  the  past  fifteen
years.   Automatic  speech  recognition,  per  se,  is  concerned 
with  automatically  determining  linguistic  messages  spoken  to 
the  voice  recognizer  by  comparing  them  to  acoustic  data  stored 
in  the  recognition  system.   Both  industry  and  the  military 
have  decided  to  study  the  feasibility  of  incorporating
interactive  voice  recognition  systems  in  their  computer
operations  in  order  to  have  a  more  natural  interface  with  the 
computer,  to  increase  the  speed  of  data  entry  and  retrieval 
and  thereby  increase  throughput,  and  to  lower  the  input  error 
rate.   A  voice  recognition  system,  using  one's  natural  lan- 
guage, would  certainly  seem  to  have  the  potential  for  reduc- 
ing errors  at  the  man-machine  interface.   In  addition,  the 
higher-level  personnel  in  industry  and  the  military,  those 
specifically  who  must  make  the  important  decisions  and  who 
most  need  the  decision-making  aid  of  the  computer,  are  those 
least  likely  to  sit  at  the  keyboard  and  use  the  computer. 
So  it  was  thought  that  voice  interaction  would  help  these 
high-level  personnel  become  more  direct  users  of  the  systems 
on  which  they  depend. 

Interactive  voice  recognition  systems  (i.e.,  those 
which  give  either  a  vocal  or  a  displayed  response  to  a  verbal 
input)  can  be  basically  divided  into  two  categories: 
isolated  word  recognizers  and  continuous  speech  recognizers. 
Isolated  word  recognition  systems  were  the  first  type 
developed  and  by  far  the  easier  of  the  two  to  engineer  and 
construct.   An  isolated  word  recognizer,  with  a  limited  vocabulary
of  x  number  of  words  or  utterances  (short  phrases),
must  simply  recognize  the  utterance  spoken  to  it  and  respond 
as  programmed.   This  recognition  is  accomplished  by  "training" 
the  system  prior  to  its  use.   Anyone  who  will  be  using  the 
system  trains  it  by  repeating  the  various  vocabulary  words  a
number  of  times,  usually  between  five  and  ten,  with  different
inflection,  stress,  pronunciation,  etc.,  while  in  a  "training 
mode."   The  parameters  of  the  pronunciation  of  each  utterance 
are  then  averaged  and  stored  in  the  digital  speech  processor 
memory  of  the  system.   Then  when  a  word  or  phrase  is  spoken 
to  the  recognition  system,  its  parameters  are  compared 
digitally  to  all  those  stored  in  memory;  ideally  a  match
is  found  and  the  computer  makes  the  proper  response  [1].
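
The train-then-match cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature vectors are made-up three-element lists, the Euclidean distance and the rejection threshold are choices made for this sketch, and the function names are hypothetical. It does not represent the internals of any particular recognizer.

```python
import math


def average_template(samples):
    """Average several equal-length feature vectors into one stored template."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]


def train(training_data):
    """Build one template per vocabulary word from its training repetitions."""
    return {word: average_template(reps) for word, reps in training_data.items()}


def recognize(templates, features, reject_distance=1.0):
    """Return the word whose template is closest, or None if nothing is close."""
    best_word, best_distance = None, float("inf")
    for word, template in templates.items():
        distance = math.dist(features, template)  # Euclidean distance (assumed metric)
        if distance < best_distance:
            best_word, best_distance = word, distance
    return best_word if best_distance <= reject_distance else None


# Toy, invented feature vectors: two training repetitions per word.
training_data = {
    "course": [[1.0, 0.2, 0.1], [0.8, 0.4, 0.1]],
    "speed": [[0.1, 0.9, 0.7], [0.3, 1.1, 0.5]],
}
templates = train(training_data)
```

Calling `recognize(templates, [0.95, 0.25, 0.1])` selects "course", while an utterance far from every template is rejected with `None` rather than being forced onto the nearest word.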

While  this  indeed  sounds  like  a  complex  process,  con- 
sider what  the  continuous  speech  recognition  system  must  do. 
In  addition  to  all  the  above  it  must  be  able  to  recognize 
word  sequences  and  digit  strings.   It  must  be  able  to  find 
boundaries  between  words,  or  segment  the  utterance  either 
explicitly  or  implicitly  by  trying  to  fit  together  sequences 
of  word  pronunciations  before  the  final  classification 
process  [2,3].   It  is  difficult  to  analyze  the  beginnings  and 
endings  of  words  unless  adjacent  words  are  known;  it  is  much 
easier  to  recognize  words  spoken  in  isolation,  or  separated 
by  short  pauses,  than  those  with  no  pauses  between  them. 
However,  it  is  very  unnatural  for  humans  to  pause  after  speaking
each  word  in  a  sentence,  and  although  isolated  word
recognition  systems  have  been  in  use  since  1972,  further
study  into  advanced  systems  has  continued.
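
To illustrate why pauses make segmentation easier, the following Python sketch splits a sequence of per-frame energy values into utterances wherever a sufficiently long silence occurs. The threshold, the minimum pause length and the function name are assumptions made for this example; they are not taken from any of the systems discussed.

```python
def segment_by_pauses(energies, threshold=0.1, min_pause=3):
    """Return (start, end) frame ranges of speech, with end exclusive.

    A segment is closed only after at least `min_pause` consecutive frames
    fall below `threshold`, so shorter dips do not split a word.
    """
    segments = []
    start = None       # index of the first frame of the current segment
    silent_run = 0     # consecutive low-energy frames seen inside a segment
    for index, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = index
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause:
                segments.append((start, index - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Close a segment still open at the end, dropping trailing silence.
        segments.append((start, len(energies) - silent_run))
    return segments
```

With words separated by clear pauses the function returns one range per word. When words are run together without pauses, the whole phrase comes back as a single range, which is the boundary-finding difficulty a continuous speech recognizer must solve by other means.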

2.   Some  Past  Uses  of  Voice  Recognition  Systems 

Beginning  in  1972,  there  have  been  several  successful
uses  of  interactive  voice  recognition  systems  in  industrial
settings.   These  have  been  strictly  isolated  word  systems  up
to  this  time.   It  has  been  found  that  by  using  voice  systems 
to  interact  with  a  computer,  a  worker's  hands  and  eyes  are 
both  free  to  continue  their  tasks.   Data  entry  is  thereby
speeded  up  because  the  worker  does  not  have  to  stop  what  he
is  doing,  write  down  or  directly  enter  data  and  then  return
to  where  he  left  off.   Voice  also
cuts  down  on  the  number  of  errors  often  encountered  in  this 
process  or  in  other  processes  where  the  first  worker  must 
relay  information  to  a  second  worker  who  then  enters  what  he 
heard  (perhaps  incorrectly)  into  the  data  system. 

Airlines  were  the  first  to  use  voice  recognition  to 
input  data  to  a  computer  for  the  correct  routing  of  baggage 
to  various  aircraft.   It  was  found  very  efficient  to  allow 
the  baggage  handler  to  input  data  by  voice,  freeing  his  hands 
and  eyes  to  look  at  and  handle  the  pieces  of  luggage.   Banks 
have  been  able  to  accomplish  paperless  transfers  of  funds, 
dividends,  retirement  payments  and  the  payments  of  bills  by 
simply  speaking  the  dollar  amount  to  be  transferred  to  the 
voice  recognition  system.   Quality  assurance  checks  on 
manufactured  goods  have  been  greatly  simplified  and  speeded 
up  in  many  cases  by  allowing  inspectors  to  use  their  hands 
and  eyes  for  the  inspections  while  simultaneously  inputting 
data  to  a  computer  by  voice.   In  addition  to  these  few 
examples  of  discrete  speech  recognition  there  are  many  other 
areas  where  voice  recognition  systems  either  are  currently
being  used,  or  could  easily  be  used  in  the  future  [1,4]. 

B.   STUDY  AND  TESTING  OF  VOICE  RECOGNITION  SYSTEMS 

Although  discrete  speech  recognition  systems  have  been,
and  are,  in  use,  research  has  continued  on  both  the  discrete
and  continuous  speech  systems.   Probably  the  largest  such 
study  undertaken  to  date  has  been  the  Advanced  Research 
Projects  Agency  (ARPA)  five  year  $15  million  Speech  Under- 
standing Research  (SUR)  project  begun  in  1971.   This  project 
was  designed  to  provide  a  breakthrough  in  the  handling  of 
spoken  sentences,  by  the  use  of  higher-level  linguistic 
information  and  specific  task-dictated  constraints  on  what 
could  be  said  [5].   It  was  thought  that  this  was  an  important
project  because  of  increased  industrial  interest  in  speech 
recognition,  government  interest  in  future  programs,  the  work 
of  several  foreign  countries  in  the  field,  and  projected 
future  widespread  applications.   In  1978  the  projected  ten 
year  sales  of  2.5  million  speech  processing  units  ($4.8 
billion)  seemed  to  lend  a  great  deal  of  credence  to  these
points  [5].

The  SUR  project  was  concerned  with  understanding  as 
opposed  to  simply  word  recognition.   By  understanding  was 
meant  having  the  system  interpret  an  utterance  and  respond 
correctly.   The  project  was  designed  to  be  highly  task 
oriented,  and  to  have  speech  analyzed  and  interpreted  in 
the  context  of  a  task,  rather  than  interpreting  each  word  or
component  of  the  utterance  individually  [3,6].   Other  goals
included  a  working  vocabulary  of  1,000  words  for  the  system
and  accuracy  of  90  percent  averaged  over  several  different
speakers.

The  SUR  project  was  designed  to  develop  several  inter- 
mediate "throw-away"  systems  rather  than  to  work  toward  one 
carefully  designed  ultimate  system.   With  this  in  mind  there 
were  five  main  system  contractors  and  four  specialist  contrac- 
tors engaged  in  the  research  at  the  start  of  the  SUR  project. 
The  five  main  contractors  were  Bolt,  Beranek  and  Newman  (BBN);
Carnegie-Mellon  University;  Lincoln  Laboratory;  System  Develop- 
ment Corporation;  and  SRI  International.   The  four  specialist 
contractors  were  Haskins  Laboratory;  Speech  Communications 
Research  Laboratory;  Univac;  and  the  University  of  California 
at  Berkeley  [7].

Approximately  one-half  way  through  the  five  year  project 
three  systems  which  seemed  to  be  farthest  along  in  meeting 
ARPA's  goals  were  selected  to  continue  the  project.   When  SUR 
ended  in  September  1976  it  was  generally  agreed  that  it  had 
greatly  advanced  the  state  of  the  art  in  continuous  voice 
recognition  and  that  cost-effective  speech  input  was  a 
plausible  scientific  and  technical  goal  [6].   One  of  the
final  three  systems  called  HARPY,  developed  by  Carnegie- 
Mellon  University,  met  all  of  ARPA's  initial  goals.   Using 
a  vocabulary  of  1,011  words  and  five  different  speakers, 
HARPY  achieved  a  total  sentence  accuracy  (i.e.,  all  words
correct)  and  semantic  accuracy  (i.e.,  correct  response
accuracy)  of  over  90  percent  for  the  specific  task  of  document
retrieval  [5,6].   The  other  two  systems  tested,  HWIM  (for
Hear  What  I  Mean,  by  BBN)  and  HEARSAY  II  (by  Carnegie-Mellon) 
fell  somewhat  short  of  the  stated  objectives. 

Another  more  recent  study  of  voice  technology  was  done 
for  the  Rome  Air  Development  Center,  Rome,  New  York.   In  this 
project  the  use  of  voice  systems  to  input  cartographic  data 
for  the  Defense  Mapping  Agency  Aerospace  Center  was  studied. 
It  was  found  that  voice  input  was  faster,  more  accurate  and
easier  to  use  than  the  paper,  pencil  and  keypunch  that  were
then  in  use.   In  addition,  the  voice  system  eliminated
the  need  for  skilled  typists  to  interact  with  the  computer. 
It  was  found  that  inexperienced  personnel  entered  data  much
faster  by  voice  than  did  unskilled  typists  at  a  keyboard,
indicating  that  much  less  training  was  required  to  operate
the  voice  recognition  system  than  to  skillfully  use  the
keyboard  [8].   For  this  particular
task,  and  for  others  as  well,  since  voice  is  the  most  natural 
mode  of  communication,  it  was  hoped  that  its  performance  level 
would  be  higher  than  manual  input  with  a  minimum  of  training. 

A  final  example  of  a  recent  voice  recognition  study  [9],
carried  out  at  the  Naval  Postgraduate  School  (NPS),  compared
the  uses  of  manual  and  voice  inputs  to  run  a  distributed
computer  network.   Using  twenty-four  military  officers  as
subjects  operating  on  the  ARPA  Network,  and  using  a  fixed
scenario  of  instructions,  it  was  found  that  voice  input  -
again  with  minimal  voice  practice  -  was  17.5  percent  faster
than  manual  typing  input,  and  manual  input  had  183.2  percent
more  entry  errors  than  did  the  voice  input.   It  is  presumed
that  an  even  greater  difference  would  have  been  recorded  had 
experienced  voice  input  subjects  been  used. 

C.   POSSIBLE  MILITARY  USES  OF  VOICE  RECOGNITION  SYSTEMS 

The  military  is  also  carefully  studying  the  use  of  voice 
interactive  systems  for  many  varied  applications.   The  author 
has  encountered  several  possible  Navy  applications  which  are 
prime  candidates  for  voice  recognition  use.   In  the  area  of 
tactical  data  systems,  normally  data  has  been  directly  entered 
from  remote  sensors  or  by  an  operator  at  a  keyboard,  and  then 
either  acted  upon  or  retrieved  by  the  operator.   Voice  systems 
can  greatly  facilitate  the  operator's  data  entry  or  retrieval 
by  allowing  him  to  interact  vocally  with  the  system  rather 
than  requiring  a  skilled  typist  at  the  keyboard.   This  should 
reduce  the  time  needed  for  interaction  and  the  possibility  of 
many  errors  [6].

A  study  at  NPS  addressed  the  possibility  of  using  a  voice 
recognition  system  as  the  interface  between  a  ship's  Tactical 
Action  Officer  (TAO)  and  the  Naval  Tactical  Data  System  (NTDS) 
computer  in  order  to  reduce  reaction  time.   This  study  also 
postulated  the  use  of  a  voice  synthesizer  to  output  the 
information  requested  from  the  computer.   The  authors  felt 
that  there  is  an  incompatibility  between  a  discrete  speech
system  and  other  communications  which  the  TAO  uses.   During 
a  period  of  tension,  it  might  be  difficult  to  use  discrete 
speech  with  one  system  and  continuous  modulation  on  others. 
It  was  also  felt  that  a  discrete  speech  recognition  system 
would  not  be  compatible  with  the  rapid  pace  of  a  TAO's 
duties  [10].   Further  study  should  be  done  in  this  area.

Naval  aviation  is  a  field  where  there  are  a  great  many 
possibilities  for  the  use  of  voice  systems.   One  study  [11] 
reported  investigating  the  feasibility  of  using  a  Voice 
Recognition  and  Synthesis  (VRAS)  system  with  the  Advanced 
Integrated  Display  System  (AIDS)  on  Navy  aircraft.   VRAS,  a 
software  package  of  real-time  voice  processing  routines,  when 
used  with  the  AIDS  cockpit  information  system  would  provide  a 
much  improved  man-machine  interface  between  the  pilot  and  the 
onboard  computer.   The  voice  interactive  system  in  this  case 
could  handle  complex  tasks  encountered  in  an  airborne  environ- 
ment and  could  free  the  eyes  and  hands  of  the  pilot  for  other 
tasks.   Some  possible  uses  would  include  selecting  a  missile 
verbally  vice  manually,  and  having  this  confirmed  verbally, 
thereby  allowing  the  pilot  to  fly  a  better  intercept  profile. 
The  system  could  be  used  for  reporting  (e.g.,  "report  air- 
speed"), data  entry,  systems  checks  where  VRAS  reports  when 
a  checklist  is  complete,  and  so  on.   It  is  thought  that  this 
might  help  reduce  the  clutter  of  instrumentation  and  fault 
warning  displays  in  the  aircraft.   In  addition,  it  was  even 
postulated  that  a  speech  recognition  system  together  with  an
adequate  display  system  could  substitute  for  a  second  man  in 
an  F-14/A-6  type  aircraft.   It  could  save  space,  and  reduce 
weight,  fuel  consumption,  manpower,  training  and  the  life 
cycle  costs  of  an  aircraft  [6].

Other  military  areas  where  voice  recognition  systems 
could  be  used  might  include  command  centers,  combat  informa- 
tion centers  on  board  Navy  ships,  inputs  for  weapons  fire 
control  systems,  and  air  traffic  control.   Very  interesting 
and  relevant  research  is  presently  being  done  at  NPS  on  the 
possibility  of  using  voice  systems  for  the  military  photo 
interpreter  and  for  use  with  the  Joint  Chiefs  of  Staff  (JCS) 
Emergency  Action  Message  (EAM)  system.   Appendix  A  lists  voice 
recognition  studies  which  have  been,  or  are  presently  being 
conducted  at  NPS.

Although  a  good  deal  of  research  has  been  done  on  the 
feasibility  and  design  of  interactive  voice  recognition  sys- 
tems, much  is  yet  to  be  done.   For  instance,  how  do  you  improve 
the  acoustic  phonetic  analysis  ability  of  a  system  so  that  it 
is  able  with  a  high  degree  of  accuracy  to  understand  continuous 
voice  commands  from  a  large  number  of  people?   Is  there  really 
even  a  need  for  continuous  voice  recognition  systems?   They 
would  certainly  be  nice,  and  they  are  much  more  "natural"  than 
isolated  word  systems  for  a  human  user,  but  what  is  the  op- 
portunity cost  of  developing  them?   These  questions  are  now 
being  answered  and  will  be  answered  in  the  future,  thanks  in 
great  part  to  the  impetus  of  the  SUR  project. 
II.   BACKGROUND

The  area  of  Command,  Control  and  Communications,  or  C3,
has  been  an  integral  part  of  human  existence  since  the  beginning
of  civilization,  although  it  has  gone  by  different  titles
and  has  had  slightly  different  shades  of  meaning.   There  is  a
great  deal  of  difficulty  even  now  in  defining  and  quantifying
this  "new"  area  of  C3.   It  is  definitely  a  process,  it  involves
equipment  and  individuals,  and  also  goals  or  missions.
To  this  author  C3  is  a  process  or  means  by  which  a  military
commander  (or  civilian  authority)  exercises  authority  and
direction  in  allocating  scarce  resources  (e.g.,  money,  troops,
ships,  etc.)  in  order  to  achieve  organizational  goals  in  the
most  efficient  manner  possible.

A.   VOICE  RECOGNITION  IN  C3 

In  his  action  of  directing  or  allocating  resources,  in
performing  the  vital  elements  of  C3,  the  commander  must  interact
with  individuals  and  equipments.   Several  of  the  military
examples  of  speech  recognition  study  in  Section  I  fall  within
this  area  of  C3.   These  examples  included  the  TAO-NTDS  interface,
use  of  voice  recognition  in  a  command  center  or  CIC  and
use  of  voice  recognition  by  a  pilot  in  the  cockpit  of  an
aircraft.   Each  of  these  certainly  depicts  a  command  and
control  situation  where  voice  recognition  systems  might  be  of
use.   Additionally,  the  example  [9]  of  the  increased  speed  of
input  and  lower  error  rates  provided  by  voice  input  while
controlling  a  distributed  computer  network  certainly  points
to  the  possible  use  of  voice  for  C3  purposes.

Several  features  make  speech  recognition  potentially  very
useful  in  the  area  of  C3.   It  is  felt  that  there  will  be  a
closer  coupling  of  the  commander  with  the  system  he  depends  on
when  using  speech  inputs.   Most  commanders  would  never  tie
themselves  down  to  a  keyboard  during  any  crisis  or  battle
situation.   With  the  use  of  speech  recognition  and  a  wireless
microphone  there  would  not  be  this  feeling  of  being  tied
down.   There  would  also  be  more  centralization  of  control  in
a  crisis  situation.   This  would  result  in  increased  speed  of
interaction  with  the  system,  and  a  more  effective  use  of  the
new  support  technology  available  [6].

In  a  C3  environment,  voice  systems  could  certainly  be
used  for  data  input  and  retrieval.   A  Task  Force  commander
would  directly  use  such  a  system  for  information  management
and  evaluation,  as  an  aid  in  decision  making,  and  for  decision
dissemination.   The  closer  a  commander  can  be  to  the  system
upon  which  he  bases  his  decisions,  the  better  the  quality  of
his  decisions  should  be,  with  greater  avoidance  of  serious
error.   Command  language  also  is  of  limited  complexity  with  a
rather  large  vocabulary  to  cover  many  possibilities,  and  this
should  suit  it  well  to  a  voice  recognition  system.
B.   WARGAMING/SIMULATIONS  FOR  MEASURING  C3  EFFECTIVENESS

One  of  the  major  problems  in  the  C3  arena  has  been  how  to
measure  the  effectiveness  of  a  C3  system.   Since  C3  must  function
in  distinctly  different  conditions  (e.g.,  peacetime,
periods  of  crisis,  conventional  or  nuclear  war)  this  becomes
increasingly  difficult.   How  does  one  gauge  or  measure
whether  a  Command  and  Control  system  will  function  in  a
nuclear  war?   More  important,  perhaps,  is  whether  the  system
will  function  in  those  transition  times  between  each  of  these
major  conditions.

It  is  certainly  not  sufficient  to  measure  effectiveness
by  simply  comparing  the  "output"  of  one  system  with  that  of
another.   For  example,  for  a  new  communications  system  simply
having  a  higher  message  handling  rate  or  a  lower  bit  error
rate  than  an  existing  system  does  not  necessarily  improve  the
C3  capability.   The  effectiveness  of  a  system  in  improving  the
chances  of  victory  in  battle,  or  for  achieving  organizational
goals,  makes  it  a  better  C3  system.   Since  it  is  often  not
possible  to  test  a  system  under  such  conditions,  simulations
and  models  are  often  used.

War  games  are  a  type  of  simulation  frequently  used  by  the
military  to  evaluate  C3  effectiveness.   Through  the  use  of  a
war  game  evaluators  and  commanders  can  determine  with  a  great
deal  of  accuracy  the  effectiveness  of  present  and  proposed
C3  technologies  under  simulated  warfare  conditions.   Such
war  games  often  allow  for  replication  so  that  a  scenario
basically  can  be  replayed  using  different  C3  strategies  in
order  to  evaluate  the  effectiveness  of  one  system  as  opposed
to  another.   War  games  are  a  very  cost-effective  means  of
running  such  an  evaluation  under  realistic  conditions  using
experienced  players.

1.   Manual  and  Voice  Inputs  for  Games 

The  most  realistic  war  games  today,  those  which  are 
able  to  be  run  at  a  near  real-time  speed,  which  are  able  to 
enter  and  disseminate  a  large  volume  of  sensor  and  fire  con- 
trol data,  and  which  are  able  to  regularly  and  quickly  update 
displays  are  either  computer-assisted  or  computer-run  war 
games.   Manual  war  games,  although  generally  no  less  accurate 
than  computer-assisted  games,  are  usually  very  slow  moving, 
require  many  extra  participants  to  record  data  and  often 
quickly  become  monotonous  and  tedious.   In  a  computer-assisted 
war  game  commands  are  generally  input  at  a  keyboard  as  is 
usually  the  case  for  most  other  computer-type  functions,  as 
previously  noted.   It  is  certainly  plausible  to  consider  using
voice  input  devices  to  run  such  war  games.

If  war  games  are  to  be  used  to  evaluate  C3  effectiveness,
one  facet  of  such  an  evaluation  certainly  could  be  any
increase  in  effectiveness  provided  by  a  voice  recognition
system  as  opposed  to  conventional  manual  input.   In  fact  a
war  game  can  be  used  as  a  vehicle  for  testing  the  concept  of
using  voice  recognition  equipment  in  any  number  of  other
military  applications  where  high  speed  of  input  and  low  error
rates  are  necessary.   It  was  with  this  thought  in  mind  that
the  author  decided  to  develop  and  conduct  an  experiment  comparing
the  use  of  voice  and  manual  inputs  to  run  a  Naval  war
game.   The  author  chose  the  CINCPACFLT  version  of  the  Warfare
Environmental  Simulator  (WES)  as  the  war  game  to  use  in  this
experiment.   WES  was  chosen  mainly  because  it  is  easily  accessible
from  the  NPS  Remote  Site  Module  (RSM)  and  because  the
author  was  already  somewhat  familiar  with  its  operation.

C.   DESCRIPTION  OF  THE  WARFARE  ENVIRONMENTAL  SIMULATOR  [12] 

The  Warfare  Environmental  Simulator  (WES)  is  a  computer- 
assisted  war  game  which  runs  on  a  DEC  KL-2040  or  a  PDP-10 
computer  at  the  Naval  Ocean  Systems  Center  (NOSC),  San  Diego,
California.   WES  is  a  two-sided  interactive  game  in  which 
Blue  and  Orange  sides  can  define,  structure  and  control  their 
own  forces.   The  game  is  strictly  a  Naval  war  game  which 
employs  approximately  80  player  commands  to  control  the  plat- 
forms and  sensors  engaged  in  the  game. 

Each  command  position  in  a  WES  game  contains  a  graphics 
terminal  situation  display,  an  alphanumeric  terminal  present- 
ing status  board  displays  and  another  alphanumeric  terminal 
for  input  of  player  commands.   This  player  terminal  acts  as 
both  an  input  and  an  output  terminal.   While  the  system  is  in 
the  input  mode  output  messages  are  queued.   The  color  graphics 
display  is  driven  by  a  PDP-11/70  which  is  interfaced  with 
NOSC's  KL-2040  or  PDP-10  via  the  ARPANET.   WES  operates  under
either  the  TOPS-20  or  TENEX  systems. 
The  WES  game  is  a  combination  of  three  major  processes,
called  BUILD,  FORCE  and  WARGAM.   Each  of  these  is  an  integral 
part  of  the  war  game  and  must  be  initialized  and  used  prior 
to  and  during  game  play.   The  BUILD  process  is  used  to  create 
and  modify  a  database  of  game  objects  such  as  ships  or  shore 
bases.   With  BUILD  a  player  may  add,  delete  or  modify  a  file 
of  game  objects  in  the  database.   This  will  normally  be  done 
prior  to  game  play  when  determining  the  forces  needed  for  the 
game.   The  database  contains  values  for  ship  classes,  shore 
bases,  aircraft  types,  missiles,  sensors  and  weapons. 

The  FORCE  process  creates  the  actual  game  scenario  to  be 
used.   With  FORCE  game  objects  from  BUILD  files  are  organized 
into  task  hierarchies  for  use  in  the  game.   FORCE  specifies 
the  actual  names  and  classes  of  ships,  their  initial  locations, 
courses  and  speeds  along  with  any  associated  aircraft,  sensors 
and  weapons.   FORCE  allows  a  player  to  create  new  game  scena- 
rios, to  modify  a  scenario,  to  change  numeric  parameters  or 
to  input  or  delete  items  from  a  scenario.   Contingency  plans 
which  might  be  used  during  a  game  can  also  be  created  and 
entered  into  the  specific  game  database  by  using  FORCE. 

WARGAM  actually  runs  the  interactive  game  based  on  the 
chosen  scenario  and  the  commands  input  by  the  players.   Once
initiated  it  responds  to  player  commands,  generates  both  the 
graphics  and  the  status  board  displays  and  updates  these 
displays  each  game  minute.   The  WES  graphics  display  at  NPS 
uses  a  GENISCO  display  processor/CONRAC  CRT  to  display  in 
color  the  graphic  situation  display.   This  display  includes
grid  tick  marks,  background  maps,  NTDS  symbology  for  friendly, 
neutral  and  enemy  tracks,  lines  of  bearing  for  passive  sensors, 
weapons  envelopes  and  game  time. 

Six  alphanumeric  status  board  displays  are  controlled  by 
WARGAM  and  are  shown  on  a  user  terminal  one  at  a  time.   The
player  controls  the  status  board  functions  by  depressing 
appropriate  keys  at  the  terminal.   The  six  status  board  dis- 
plays include  the  following:   active  track  status,  passive 
track  (ESM)  status,  friendly  ship  status,  friendly  air  status, 
friendly  shore  bases  status  and  flight  status.   These  displays 
then  contain  all  the  status  information  which  one  would 
expect  to  find  in  the  CIC  on  a  surface  ship. 

The  WES  commands  which  players  use  to  control  the  war  game 
are  highly  formatted  in  terms  of  syntax  and  input  parameters. 
Two  types  of  errors  are  possible  when  inputting  a  command. 
First,  the  syntax  may  be  incorrect.   In  this  case  an  immediate 
warning  is  issued  on  the  terminal  saying  that  the  command  cannot
be  parsed.   This  should  alert  the  player  to  check  his
command  and  then  reenter  it  correctly.   Second,  a  command
might  order  some  impossible  action  (e.g.,  addressing  a  ship
not  in  the  game).   No  immediate  warning  is  issued  in  this  case
since  the  order  parses  correctly.   However,  when  execution  of 
the  order  is  attempted  it  cannot  be  carried  out  and  this  fact 
is  displayed  on  the  terminal  for  the  player.   When  an  order 
is  entered  correctly,  the  system  responds  that  the  order  has 
been  entered;  this  indicates  only  that  the  order  was  parsed  and
sent  on  for  execution,  not  that  the  order  is  free  of
discrepancies  (as  noted  in  the  second  error  case
above).
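
The two error cases can be illustrated with a small Python sketch that separates parsing from execution. The command grammar used here ("<SHIP> COURSE <n>" or "<SHIP> SPEED <n>"), the message wording and every function name are hypothetical stand-ins chosen for the example. They are not the actual WES command syntax or responses.

```python
import re

# Hypothetical grammar for illustration only; not the real WES syntax.
COMMAND_PATTERN = re.compile(r"^([A-Z]+) (COURSE|SPEED) (\d{1,3})$")


class ParseError(Exception):
    """Raised at entry time when a command does not match the grammar."""


class ExecutionError(Exception):
    """Raised later when a well-formed order cannot be carried out."""


def parse(text):
    match = COMMAND_PATTERN.match(text.strip().upper())
    if match is None:
        raise ParseError("command cannot be parsed")
    ship, verb, value = match.groups()
    return ship, verb, int(value)


def enter_command(text, pending):
    """First check: syntax only. A parsed order is queued for execution."""
    try:
        pending.append(parse(text))
    except ParseError as error:
        return f"WARNING: {error}"
    return "ORDER ENTERED"


def run_pending(ships_in_game, pending):
    """Second check: orders that parsed may still fail when executed."""
    messages = []
    while pending:
        ship, verb, value = pending.pop(0)
        try:
            if ship not in ships_in_game:
                raise ExecutionError(f"order cannot be carried out: no unit {ship}")
            ships_in_game[ship][verb.lower()] = value
        except ExecutionError as error:
            messages.append(str(error))
    return messages
```

In this sketch an order addressed to a unit that is not in the game still returns "ORDER ENTERED" at entry time, and the failure surfaces only when `run_pending` attempts execution, mirroring the second error case described above.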

It  was  with  this  game  of  WES  as  described  above  that  the 
author  conducted  his  voice/manual  input  experiment.   The 
details  and  background  of  the  experiment  are  described  in 
Section  III  and  its  results  in  Section  IV  following.   The 
conclusions  drawn  from  the  data  collected  address  the  feasibil- 
ity of  using  an  automatic  voice  recognition  system  to  run 
computer-assisted  war  games  in  general,  and  WES  in  particular. 
III.   EXPERIMENTAL  DESIGN

A.   CONCEPT  OF  THE  EXPERIMENT 

The basic goal of this experiment was to test the feasibility
of operating WES by using voice inputs rather than the
customary manual inputs. This would be accomplished by having
a number of test subjects individually enter valid WES commands
for BLUE forces while the game was running, recording
the time necessary to successfully enter the commands and the
number of errors committed with voice and manual input, and
then analyzing the data to see whether one entry method was
superior to the other. Although the WES game would be running,
the only commands entered would be for the BLUE forces,
and therefore there would be no interaction between BLUE and
ORANGE, or actual "game play." BLUE-ORANGE interaction was
not considered necessary for the goals of this experiment.
However, it was considered important to have WES running during
the experiment, rather than having the subjects merely
type out the WES commands or speak them to a voice recognizer,
so that the actual interaction with the WES input/output
player terminal would be accomplished as in a two-sided war
game.

In order to run WES, as noted in Section II, game forces
must be assigned and a scenario established. The author chose
to use an existing WES scenario with its associated forces
for the experiment. The CUBA scenario was chosen because it is
relatively simple yet provides entirely adequate forces for the
experimental goals. In this scenario three United States
warships (aircraft carrier ENTERPRISE, guided missile destroyer
BERKELEY and nuclear submarine STURGEON) are opposed by three
Soviet warships and one merchant ship in a setting similar to
the 1962 Cuban missile crisis. The test subjects would command
the ships and forces of the BLUE task force by using a
fixed series of commands provided to them.

It was necessary to establish a basic vocabulary which
the subjects would use to enter the player commands to WES.
This vocabulary had to be complete enough to allow formulation
of any of the WES commands [12] which might be necessary
during play of a game. The vocabulary had to contain all the
scenario-specific words (e.g., ENTERPRISE, BERKELEY) which
might become necessary in order to command those BLUE forces
in the CUBA WES scenario. Also, the vocabulary had to be
compatible with both the voice and keyboard methods of entry.
The vocabulary which was used is considered sufficient to run
any basic WES game involving the forces in the scenario. The
total vocabulary amounted to 162 words or short phrases
(Appendix B).

B.   EQUIPMENT  USED 

1.   Hardware  Description  [13] 

For the experiment a Threshold Model T600 discrete
utterance voice recognition unit manufactured by Threshold
Technology, Inc. was used. The T600 is an electronic speech
recognition device which automatically recognizes utterances
of up to two seconds in duration. These utterances can be
several words in length as long as they do not exceed this
time duration. Since it is a discrete, or isolated, speech
recognition unit, there must be a short pause (at least 0.1
second) between utterances. The T600 allows up to 256
separate voice utterances to be stored in memory. As noted
above, 162 utterances were the total vocabulary for this
experiment.

The Model T600 terminal used in this experiment consists
of an analog speech preprocessor, microcomputer,
CRT/keyboard unit, magnetic tape cartridge unit, remote voice
input unit and noise-cancelling microphone. The speech
preprocessor and microcomputer are contained in a main terminal
processor unit. The speech preprocessor accepts spoken input
from the remote voice input unit, extracts speech parameters
and converts these to digital signals which are then processed
by the microcomputer. The microcomputer compares these input
signals with stored reference patterns to determine which
vocabulary words were spoken. The reference patterns for all
the vocabulary are established during a training phase when
the user trains the voice recognizer by repeating each of the
vocabulary utterances ten times. If a close match is found
between an input speech utterance and a reference vocabulary
pattern, the utterance is "recognized" by the T600. It then
sends to the user's computer the appropriate output string
of characters associated with the recognized input.
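
The train-then-match cycle described above can be illustrated with a generic nearest-template sketch. The actual speech parameters, distance measure and acceptance threshold used inside the T600 are not given here, so the feature vectors, the Euclidean distance and the `max_distance` parameter below are all assumptions made purely for illustration.

```python
import math


def train(repetitions):
    """Average repeated feature vectors into one reference pattern."""
    count = len(repetitions)
    return [sum(column) / count for column in zip(*repetitions)]


def recognize(features, references, max_distance):
    """Return the closest vocabulary entry, or None if nothing is close."""
    best_entry, best_distance = None, math.inf
    for entry, reference in references.items():
        distance = math.dist(features, reference)  # Euclidean distance
        if distance < best_distance:
            best_entry, best_distance = entry, distance
    # Reject the utterance when even the best match is not close enough.
    return best_entry if best_distance <= max_distance else None


# Toy two-dimensional "speech parameters" (invented values).
references = {
    "ARM": train([[1.0, 0.0], [0.8, 0.2]]),
    "FIVE": train([[0.0, 1.0], [0.2, 0.8]]),
}
```

With these toy references, an input near `[0.9, 0.1]` is matched to "ARM", while an input far from every reference is rejected rather than forced onto the nearest word.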

The  T600  has  three  types  of  memory  which  the  user 
may  modify:   speech  reference  patterns,  prompt  character 
strings  and  output  character  strings.   As  noted  above  the 
speech  reference  patterns  are  formed  when  the  user  trains  the 
voice  recognizer  by  repeating  the  vocabulary  utterances  a 
number  of  times.   The  prompt  character  strings  are  input  by 
a  user  at  the  keyboard  and  are  displayed  on  the  CRT  for  each 
utterance  to  prompt  the  speaker  when  he  is  training  that 
particular  utterance.   The  output  character  string,  also 
initially  entered  via  the  keyboard,  is  the  actual  output 
sent to the user's computer over a communications interface
by the T600 when an utterance is recognized. The recognized
utterances are sent exactly as if they had been typed in at
the keyboard. When spoken, each of the utterances is echoed
on the CRT as a visual display for the operator.

The speaker uses a noise-cancelling microphone plugged
into the remote voice input unit while speaking to the T600.
This microphone allows the T600 to be used in noisy areas.
The placement of the microphone by the speaker is very important
during both the training and recognition phases with
the T600. Recognition accuracy may decrease if the microphone
is moved from one position to another in relation to the
speaker's mouth. It should be placed in front of the lips
but not touching them, and slightly to the side of the
speaker's mouth. The microphone should just touch the lower
lip when the lip is extended forward as far as possible. If
the microphone slips from this position while speaking, it
should be readjusted before continuing.

Data  in  the  T600  memory  is  stored  in  the  main  terminal 
processor  unit.   In  conjunction  with  this  the  magnetic  tape 
cartridge  unit,  a  digital  tape  recorder,  is  used  to  store 
this  memory  data  on  a  tape  cartridge  and  then  to  recall  it 
from  the  cartridge  whenever  desired.   The  tape,  once  recorded, 
can  be  used  to  quickly  retrain  the  terminal  with  the  user's 
speech  patterns  and  specific  vocabulary.   This  is  very  useful 
when  the  terminal  is  used  repeatedly  by  a  number  of  different 
users.

For  this  experiment  two  additional  pieces  of  equipment 
were  connected  in  parallel  with  the  T600  described  above.   An 
ADM  31  Data  Display  Terminal  with  print  much  smaller  than 
that  of  the  T600  was  used  so  that  the  longer  commands  input 
by  the  user  would  entirely  fit  on  a  single  line  rather  than 
"wrapping  around"  as  they  would  on  the  T600  CRT.   It  was  felt 
that  this  would  eliminate  one  possible  source  of  confusion 
for  the  test  subjects.   Additionally,  a  Miniterm  Model  1203 
was  used  in  order  to  obtain  a  hard  copy  printout  of  all  the 
voice  and  manual  input  commands.   This  was  necessary  to 
accurately  count  and  differentiate  between  the  types  of  input 
errors. This will be discussed at greater length in Section
IV.   The  entire  equipment  set-up  as  used  in  the  experiment  is 
shown  in  Figure  1. 

2.   Available  Input  Modes 

The  speed  of  entry  and  number  of  errors  associated 
with  three  different  input  modes  were  to  be  evaluated  in  the 
experiment.   Each  subject  would  type  the  BLUE  player  commands 
at  the  ADM  31  terminal,  would  enter  the  same  commands  using 
the  unbuffered  voice  mode  of  the  T600  and  would  enter  the 
commands  via  the  T600's  buffered  voice  mode.   The  order  of 
the  input  modes  was  varied  from  subject  to  subject  in  order 
to eliminate any bias which the ordering might have introduced.

In the typing mode with WES there is no way of correcting
an error once it is typed and before the command is sent to the
game for execution (i.e., no backspace or erase). This is
quite important since a single error will invalidate an entire
WES command. If an error is made it is best to immediately
type a carriage return (entering the incorrect order), and
then retype the order correctly and enter it into the system.
By doing this, time is saved which would otherwise be wasted
while completely entering a command already containing an
error, and the possibility of committing further errors in
this same command is eliminated.

The unbuffered voice input mode to WES is very
similar to this. The T600 will send the ASCII character
stream associated with any recognized voice input to the
user's host computer without the user being able to correct
any "incorrectly recognized" spoken input. No editing is
possible in the unbuffered voice mode of operation, and therefore,
like typing, when an error is noted it is best to enter
the command at that point and then reenter it correctly. In
contrast to this, the T600's buffered voice mode allows the
user to verify his input stream, make corrections to it if
necessary and then transmit it to the host computer. The
T600 stores the utterances in an internal buffer which may
be modified, and the contents of this buffer are sent to the
host in a "block" when the user transmits them.
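
The difference between the two voice modes can be sketched as follows. This is a minimal model, not the T600's actual interface: the class names, the `erase_last` editing operation and the use of a plain list to stand in for the host computer are all hypothetical.

```python
class UnbufferedVoice:
    """Every recognized utterance is forwarded to the host at once."""

    def __init__(self, host):
        self.host = host  # list standing in for the host computer

    def recognized(self, output_string):
        self.host.append(output_string)  # no chance to correct it


class BufferedVoice:
    """Recognized utterances are held until the user transmits them."""

    def __init__(self, host):
        self.host = host
        self.buffer = []

    def recognized(self, output_string):
        self.buffer.append(output_string)

    def erase_last(self):
        # Hypothetical editing operation on the internal buffer.
        if self.buffer:
            self.buffer.pop()

    def transmit(self):
        # The whole buffer goes to the host as one block.
        self.host.append(" ".join(self.buffer))
        self.buffer.clear()
```

For example, if "HARM" is misrecognized as "ARM", the unbuffered host has already received "ARM", whereas in the buffered model the user can call `erase_last()`, repeat the word, and only then `transmit()`.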

C.   SELECTION  OF  SUBJECTS 
1.   Backgrounds 

Twelve subjects who participated on a voluntary basis
were chosen for the experiment. Eleven of the subjects are
military officers (six Navy, four Air Force and one Army) in
paygrades O-3 through O-5, and one is a civilian professor at
NPS. Ten of the military officers are members of the Command,
Control and Communications (C3) curriculum at NPS and the
eleventh is on the faculty. Two of the twelve subjects
are female Naval officers.

All subjects had previously had at least a brief
exposure to WES while at NPS. However, only one subject,
the female faculty member, was considered to be experienced
with WES. In addition, all the subjects had at least minimal
experience with voice recognition systems, with six of the
subjects considered "experienced" with voice systems. This
experience was established for four subjects by participating
in a six-month controlled voice recognition longitudinal
study, and for the two faculty members by continuous use of
voice systems over a prolonged period of time (more than
three years for the civilian professor). This breakdown of
six experienced subjects and six inexperienced with voice
recognition systems was planned in order to determine whether
prior experience would be a significant factor in determining
the preferred method of command input to WES. A synoptic
background of the twelve test subjects is contained in
Appendix C.

2.   Initial Training

Each of the subjects met individually with the author
and was given a typing ability test. This consisted of a
five-minute typing exercise (similar to that given to a GS-2
typist) during which the subject was instructed to type two
given paragraphs totalling 21 lines as quickly and accurately
as possible without error correction. A subject's speed in
words per minute (wpm) was then calculated with a scoring
table approximately using the formula wpm = total characters/25.
A certain number of errors, increasing with the number
of gross words per minute typed, was permitted, with any
errors in excess of this number resulting in 0.2 wpm per error
subtracted from the final typing speed.
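
The scoring rule can be written out as a short sketch. The divisor of 25 presumably reflects the usual convention of five characters per word over the five-minute test. The permitted error count comes from the scoring matrix in Appendix D, which is not reproduced here, so `allowed_errors` is left as a caller-supplied parameter and the function name is hypothetical.

```python
def typing_speed(total_characters, errors, allowed_errors):
    """Approximate net wpm for the five-minute typing test.

    allowed_errors is an assumption supplied by the caller; the real
    allowance depends on gross wpm via the Appendix D scoring matrix.
    """
    gross_wpm = total_characters / 25  # wpm = total characters / 25
    excess_errors = max(0, errors - allowed_errors)
    return gross_wpm - 0.2 * excess_errors  # 0.2 wpm lost per excess error
```

For instance, 750 characters with five errors, three of them permitted, gives 30 gross wpm less 0.4 wpm, or about 29.6 wpm.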

The typing ability test was given to determine whether
there was a clear-cut distinction between typists and
non-typists among the test subjects. Although one subject typed
below 20 wpm and two subjects were above 40 wpm, nine of the
subjects were grouped between 21 and 39 wpm. Due to this
close grouping and the rather short length of the WES commands,
this difference in typing speeds was not considered
important. The typing test used, along with its scoring
matrix, is shown in Appendix D.

Each of the subjects next trained the T600 voice
recognition unit using the WES vocabulary of Appendix B.
This training was accomplished by having the subjects repeat
each vocabulary utterance ten times while in the T600 training
mode in order to optimize the stored reference patterns
for their individual speech variations. The average time
required to train the 162 utterance vocabulary was 94 minutes,
with the shortest time being 69 minutes and the longest 116
minutes.

Once the training was completed each utterance was
repeated three additional times while in the T600's recognition
mode to check for recognition accuracy. If at least
two of the three repetitions of an utterance were correctly
recognized, the utterance was considered to be properly trained.
If not, that vocabulary word was then retrained and again
checked for accuracy. On the average each subject retrained
five utterances (three being the least number retrained and
nine the highest), with phonetically similar expressions such
as HARM/ARM, with/list, back/track/attack and dive/five causing
the most difficulty.
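
The acceptance rule for the post-training check can be sketched as below. The function names and the dictionary layout are hypothetical; only the "at least two of three" criterion is taken from the procedure described above.

```python
def properly_trained(expected, recognized_trials):
    """True if at least two of the three check repetitions were correct."""
    correct = sum(1 for heard in recognized_trials if heard == expected)
    return correct >= 2


def utterances_to_retrain(check_results):
    """check_results maps each utterance to what the T600 reported
    for its three check repetitions."""
    return [word for word, trials in check_results.items()
            if not properly_trained(word, trials)]


# Invented example results for two confusable vocabulary pairs.
example = {
    "HARM": ["ARM", "HARM", "ARM"],    # one of three correct: retrain
    "DIVE": ["DIVE", "FIVE", "DIVE"],  # two of three correct: accepted
}
```

In this invented example only HARM would be flagged for retraining.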

D.   CONDUCTING  THE  EXPERIMENT 

For the experiment the author had compiled a list of 20
basic WES commands for the CUBA scenario. These 20 commands
(Appendix E) totalled 272 voice utterances and used 67 of the
162 vocabulary utterances considered necessary to run an
actual WES war game. The author had further divided these
20 commands into five shorter groups of four commands each
(Appendix F). The commands in these five groups were arranged
so that each group would be of approximately the same length.
(Those utterances in Appendices E and F which consisted of
more than one word are highlighted as they were for the subjects
during the experiment.)

Each subject would be required to input the list of 20
commands and the five shorter lists of commands by the three
methods of typing, unbuffered voice and buffered voice. The
order of the input methods and the lists of commands used
was randomly varied from subject to subject to eliminate any
bias. When inputting the short lists, whether by typing or
voice, the subjects were given a brief rest between each of
the five lists. The use of the 20 command list and the group
of five lists with breaks between each was designed to see
whether fatigue, frustration, or the prospect of having a
long or short task ahead might have any bearing on the
results of the different entry methods.

The  conceptual  design  of  the  experiment  is  shown  in 
Figure  2.   This  is  a  three-factor  nested  design  with  repeated 
measures  over  the  tasks.   Each  subject  is  nested  within  only 
one  of  the  levels  of  experience. 
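
The layout of this design can be made concrete with a small sketch that enumerates the observation cells and the resulting degrees of freedom. The subject labels (`E1` to `E6`, `I1` to `I6`) are placeholders, since the actual assignment of subjects to experience levels is given in Appendix C, and the function names are hypothetical.

```python
from itertools import product

ENTRY_METHODS = ("typing", "unbuffered voice", "buffered voice")
TASK_TYPES = ("20 commands", "five groups of 4 commands")

# Placeholder labels: six subjects nested in each experience level.
SUBJECT_LEVEL = {f"E{i}": "experienced" for i in range(1, 7)}
SUBJECT_LEVEL.update({f"I{i}": "inexperienced" for i in range(1, 7)})


def observations(subject_level, entry_methods, task_types):
    """One observation per subject for every method/task combination."""
    return [(subject, level, method, task)
            for subject, level in subject_level.items()
            for method, task in product(entry_methods, task_types)]


def design_df(n_subjects, n_entry_methods, n_task_types):
    """Between-subjects and within-subjects degrees of freedom."""
    cells_per_subject = n_entry_methods * n_task_types
    return n_subjects - 1, n_subjects * (cells_per_subject - 1)
```

With twelve subjects, three entry methods and two task types this gives 72 observations, with 11 degrees of freedom between subjects and 60 within subjects. With only the two voice methods, as applies to recognition errors, the within-subjects figure is 36.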

Once each subject had finished training the T600 he met
at a later time with the author to conduct the actual
experiment. At this point the subjects were given a brief
overview of what they would be doing along with a verbal set
of instructions (Appendix G). Since in some cases it had
been several weeks since the initial voice training, all the
subjects were given a copy of the WES vocabulary in order to
refresh their memories. In addition the subjects were provided
a list of practice commands (Appendix H) with which they
were allowed to train until they felt at ease and confident
with the use of the voice recognition system.

After each subject felt satisfied with his practice the
experiment was run. The entire list of 20 commands and the
five groups of commands, depending on the order used, were
entered into the WES game via the three different input
methods. While using the voice recognition modes, if an
utterance was misrecognized four consecutive times or an
abnormally large number of times throughout the experiment, the
author stopped the clock and had the subject retrain that
utterance rather than continue to struggle against the
system. This was done on six occasions. The results of the
experiment are contained in Section IV.

IV.   PRESENTATION OF DATA

A.   DATA  COLLECTION  TECHNIQUES 

During the typing and the unbuffered voice modes of the
experiment the Miniterm was used to keep a typescript of all
commands entered by the subjects and the responses of the
WES game. During the buffered voice mode the Miniterm was
not used since the only commands which would have been printed
were those already corrected by the subject and sent containing
no errors from the internal buffer. Instead the author
manually recorded errors during this phase.

The  following  measures  of  performance  were  recorded  during 
all  the  trials:   1)  the  time  required  to  complete  a  specific 
scenario,  and  2)  the  number  of  input  command  errors.   Input 
errors  were  divided  into  two  types,  recognition  errors  and 
operator  errors.   Recognition  errors  were  those  encountered 
when  the  T600  "thought"  the  subject  said  one  thing  but  he 
had  actually  said  another.   This  type  of  error  was  not 
applicable  to  the  typing  mode.   An  operator  error  was  any 
other type of error committed which was not attributable to
the T600 (e.g., a typing mistake, the operator forgetting
to say "space" after a number, or the operator saying "for"
(and having it recognized as "4") rather than "for the").

In analyzing the data the author was interested in the
actual number of errors committed. Therefore every single
error was counted as a separate error. For example, if the
subject made one typing error, or had one voice utterance
misrecognized during a command, this was counted as one error.
However, if the subject committed two typing errors in the
same command before entering the command, this was counted as
two errors although they only invalidated a single command.

B.   GENERAL  RESULTS 

As noted earlier, each set of 20 voice commands contained
272 voice utterances. Each subject was required to input the
total 20 commands four different times by voice (i.e., the
list of 20 commands by buffered and unbuffered voice, and
the five groups of four commands in the same manner). Therefore,
if no voice errors had been committed, the twelve subjects
would have inputted a total of 13,056 voice utterances
during the experiment. However, the occurrence of both
recognition and operator errors, and having to reenter the
commands which contained these errors, resulted in a somewhat
greater number of voice utterances for the experiment.
(The author did not physically count this total number.)
There were 982 recognition errors recorded during the
experiment.
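
The utterance count above follows from simple multiplication, sketched here. The final line is an added illustration rather than a figure reported in the experiment: because the true number of utterances was somewhat greater than the minimum, dividing the 982 recognition errors by the minimum gives only an upper bound on the per-utterance recognition error rate.

```python
UTTERANCES_PER_COMMAND_SET = 272  # the 20 commands, spoken once
VOICE_PASSES_PER_SUBJECT = 4      # 2 voice modes x 2 task types
SUBJECTS = 12
RECOGNITION_ERRORS = 982

# Minimum number of utterances if no command ever had to be reentered.
minimum_utterances = (UTTERANCES_PER_COMMAND_SET
                      * VOICE_PASSES_PER_SUBJECT
                      * SUBJECTS)

# Upper bound on the error rate, since the real denominator was larger.
max_error_rate_percent = 100 * RECOGNITION_ERRORS / minimum_utterances
```

Under these figures the minimum is 13,056 utterances and the recognition error rate is at most about 7.5 percent.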

After analyzing the typescript from the unbuffered voice
portion of the experiment, it was found that of the 67
utterances used to form the 20 WES commands, 46
had been misrecognized at least once as some other
vocabulary utterance. Twenty-one of the utterances were
never misrecognized by the T600. In the buffered mode only
the numbers of recognition errors were recorded rather than
the misrecognized words since the author was not able to
keep an accurate record of these.

There  were  more  total  errors  with  each  of  the  voice  input 
modes  than  with  the  typing  mode.   The  following  data  were 
found  when  looking  at  total  number  of  errors  (recognition 
errors  +  operator  errors) :   typing,  169  total  errors;  buffered 
voice,  542;  and  unbuffered  voice,  701.   These  figures  show 
that  the  typing  mode  had  68.8  percent  fewer  total  errors  than 
did  the  buffered  voice  mode,  and  75.9  percent  fewer  errors 
than  the  unbuffered  voice  mode. 

All  of  the  subjects,  regardless  of  typing  ability,  had 
been  inputting  data  via  a  keyboard  for  at  least  five  quarters 
while  at  NPS,  while  only  six  were  considered  experienced  in 
voice  entry.   In  addition,  subjects  seemed  to  try  to  be  quite 
precise  while  typing  at  the  keyboard  where  they  had  total 
control  over  any  errors  committed  as  opposed  to  voice  input 
where  the  T600  might  not  recognize  their  utterance. 

As far as time was concerned, the total time required
for all the subjects' inputs was 254.35 minutes for typing,
286.17 minutes for buffered voice and 585.7 minutes for
unbuffered voice. Therefore typing took 11.1 percent less time
than buffered voice, and 56.6 percent less time than unbuffered
voice input.
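
The percentage comparisons quoted in this subsection can be reproduced from the reported totals. In each case the difference is expressed relative to the larger (voice) figure; the helper name `percent_less` is hypothetical.

```python
def percent_less(smaller, larger):
    """How much smaller `smaller` is than `larger`, in percent of `larger`."""
    return 100 * (larger - smaller) / larger


TOTAL_ERRORS = {"typing": 169, "buffered": 542, "unbuffered": 701}
TOTAL_MINUTES = {"typing": 254.35, "buffered": 286.17, "unbuffered": 585.7}

errors_vs_buffered = percent_less(TOTAL_ERRORS["typing"],
                                  TOTAL_ERRORS["buffered"])
errors_vs_unbuffered = percent_less(TOTAL_ERRORS["typing"],
                                    TOTAL_ERRORS["unbuffered"])
time_vs_buffered = percent_less(TOTAL_MINUTES["typing"],
                                TOTAL_MINUTES["buffered"])
time_vs_unbuffered = percent_less(TOTAL_MINUTES["typing"],
                                  TOTAL_MINUTES["unbuffered"])
```

Rounded to one decimal place these give 68.8 and 75.9 percent for errors, and 11.1 and 56.6 percent for time.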

C.   RESULTS  FOR  SCENARIO  TIMES 

Table 1 shows the time in minutes required for each subject
to input the list of 20 commands by the three entry
methods, and Table 2 shows this data for the five groups of
commands. An analysis of variance [14] was performed on this
time data and Table 3 gives the statistical results. (The
task of inputting either the 20 commands or the five groups
of commands is hereafter referred to as the Task Type.)

Table 3 shows that there was a statistically significant
difference (at the α = .10 level) in time for experience
level, as can be seen in Figure 3. (An α level of .10, for
example, means that there is at most a 10 percent chance of
declaring a difference significant when no real difference
exists.) The experienced subjects were
able to input the commands faster via all three entry methods,
and most noticeably by unbuffered voice where the average
time climbed most steeply for the inexperienced subjects.

Table 3 also shows that there was a significant difference
(α = .01) in time for entry method. A range test [15] showed
that there was a significant improvement in time with both
typing and buffered voice over unbuffered voice, and that
there was no difference between typing and buffered voice as
far as time is concerned. These results are shown in Figure
4.

Table 1.   Time for 20 Commands (minutes)

                       UNBUFFERED     BUFFERED
SUBJECT      TYPE        VOICE          VOICE

   1        11.97        20.05          12.80
   2         5.55        20.22          16.62
   3         7.42        12.35           7.52
   4        20.37        28.77          11.72
   5         9.33        15.00          10.43
   6         9.43         6.80           7.82
   7        15.40        40.32          14.40
   8         8.67        76.88          15.82
   9         9.47        18.80           9.40
  10        12.67        20.57          13.13
  11         9.32        11.78          10.00
  12        11.15        36.40          10.80

Table 2.   Time for Five Groups of 4 Commands Each (minutes)

                       UNBUFFERED     BUFFERED
SUBJECT      TYPE        VOICE          VOICE

   1        10.27        22.40          11.73
   2         8.80        20.97          12.88
   3         9.65        10.32           9.23
   4        14.88        18.03           9.62
   5         8.10        11.78           9.05
   6        10.27        10.48           8.22
   7        11.35        44.23          16.98
   8         8.07        56.18          17.05
   9         9.42        23.22          11.28
  10        12.95        15.20          10.57
  11         8.52        21.85          14.85
  12        11.32        23.10          13.85

Table 3.   Analysis of Variance for Time

SOURCE                       df         MS            F

Between Subjects             11
  EL (experience level)       1      700.1282      3.8287*
  Error                      10      182.8630

Within Subjects              60
  EM (entry method)           2     1392.5246     10.9110**
  TT (task type)              1       15.0152       .8093
  EL x EM                     2      432.8107      3.3912*
  EL x TT                     1         .0612       .0033
  EM x TT                     2       18.8148      1.312
  EL x EM x TT                2        3.0497       .2126
  Error                      20      127.6252
  Error                      10       18.5524
  Error                      20       14.34

*  p<.10
** p<.01

df:   degrees of freedom
MS:   Mean Square
F:    F test ratio
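
Each F ratio in Table 3 is a mean square divided by an error mean square. The sketch below recomputes the ratios from the tabulated values. The pairing of each effect with an error term is an assumption inferred from the reported F values (the between-subjects error for EL, and the three within-subjects errors, in order, for the EM, TT and EM x TT families); it is not stated explicitly in the table.

```python
# (effect mean square, assumed error mean square, reported F)
TABLE_3 = {
    "EL":           (700.1282, 182.8630, 3.8287),
    "EM":           (1392.5246, 127.6252, 10.9110),
    "TT":           (15.0152, 18.5524, 0.8093),
    "EL x EM":      (432.8107, 127.6252, 3.3912),
    "EL x TT":      (0.0612, 18.5524, 0.0033),
    "EM x TT":      (18.8148, 14.34, 1.312),
    "EL x EM x TT": (3.0497, 14.34, 0.2126),
}


def f_ratio(ms_effect, ms_error):
    """F = MS(effect) / MS(error)."""
    return ms_effect / ms_error


recomputed = {source: f_ratio(ms, err)
              for source, (ms, err, _reported) in TABLE_3.items()}
```

Under this pairing every recomputed ratio agrees with the tabulated F to within rounding.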

There was also a significant experience level-by-entry method
interaction shown in Table 3. This is shown in both Figures
3 and 4 and was due mainly to the effect of the inexperienced
subjects with unbuffered voice where the average time increased
much more quickly than it did for the experienced subjects.

Table  3  further  shows  that  there  was  no  difference  in  the 
two  task  types  with  respect  to  time.  There  was  also  no  other 
significant  interaction  shown. 

D.   RESULTS  FOR  INPUT  ERRORS 

1.   Recognition Errors

The total number of recognition errors for each subject
in the two voice entry modes for 20 commands is given
in Table 4. Table 5 shows this data for the five groups of
commands. The results of the analysis of variance for this
data are given in Table 6.

Table 6 shows that there was no significant difference in
either experience level, entry method or task type with respect
to recognition errors. Although it is not surprising
that the entry method and the task type make no difference
as far as recognition errors are concerned, it is somewhat
surprising that experience level does not. The author would
have thought the opposite to be true, with experienced subjects
having significantly fewer recognition errors.

Table 6 does, however, show a significant (α = .05) interaction
between entry method and task type as depicted in
Figure 5. Although the average number of recognition errors
was greater for 20 commands than for the five groups in
unbuffered voice, the opposite was true for buffered voice.
There is also a significant three-way interaction shown in
Table 6 between experience level, entry method and task type.
This interaction is shown in Figure 6.

Table 4.   Recognition Errors for 20 Commands

            UNBUFFERED     BUFFERED
SUBJECT       VOICE          VOICE

   1            19             18
   2            24             62
   3             5              2
   4            28              8
   5            16              8
   6             2              3
   7            69             31
   8            81             26
   9             9              5
  10             6              6
  11             5             16
  12            29             20

Table 5.   Recognition Errors for Five Groups of 4 Commands Each

            UNBUFFERED     BUFFERED
SUBJECT       VOICE          VOICE

   1            25             14
   2            15             43
   3             4              7
   4            13              6
   5            13             15
   6             6              3
   7            52             33
   8            70             24
   9            17             15
  10             5              7
  11            16             43
  12            14             24

Table 6.   Analysis of Variance for Recognition Errors

SOURCE                       df         MS            F

Between Subjects             11
  EL (experience level)       1     1452           1.5332
  Error                      10      946.9916

Within Subjects              36
  EM (entry method)           1      225.3333       .5114
  TT (task type)              1        4.0833       .0510
  EL x EM                     1      420.0833       .9535
  EL x TT                     1       48            .60
  EM x TT                     1      108           5.1695*
  EL x EM x TT                1       80.0834      3.8332**
  Error                      10      440.5583
  Error                      10       79.9916
  Error                      10       20.8916

*  p<.05
** p<.10

df:   degrees of freedom
MS:   Mean Square
F:    F test ratio

2.   Operator Errors

Operator errors were all errors other than those
caused by the T600 voice recognition unit. This included
such things as typing and spelling errors in the typing mode,
and basically forgetting the various ground rules, and therefore
causing mistakes, while using the voice modes. Table 7
shows the number of operator errors committed while inputting
the list of 20 commands, and Table 8 gives this information
for the groups of commands. Table 9 shows the results of the
ANOVA performed on this data.

Table 9 shows a statistically significant difference
at the α = .05 level in operator errors for entry method. A
range  test  showed  a  significant  decrease  in  operator  errors 
for  buffered  voice  as  compared  to  both  unbuffered  voice  and 
typing.   The  range  test  showed  no  difference  between  the 
typing  and  unbuffered  voice  modes  with  respect  to  operator 
errors.   This  is  shown  in  Figure  7  where  buffered  voice  has 
fewer  operator  errors  than  the  other  input  methods  for  both 
experience  levels. 

Figure 6.   Average Number of Recognition Errors

Table 7.   Operator Errors for 20 Commands

                       UNBUFFERED     BUFFERED
SUBJECT      TYPE        VOICE          VOICE

   1          13           10              8
   2           1            9              6
   3           4            8              4
   4           6           15              4
   5           4            4              3
   6          10            1              3
   7           7            6              1
   8           9           11              9
   9           5            4              2
  10           4            2              3
  11           3            6              4
  12           5            3              2
Table 8.  Operator Errors for Five Groups of 4 Commands Each

                      UNBUFFERED    BUFFERED
SUBJECT      TYPE       VOICE        VOICE

   1          11           7            6
   2          10          12            7
   3          13           8            5
   4           6           5            6
   5           3           2            1
   6          16          10            3
   7           4           3            5
   8           9           8            5
   9           4           5            4
  10           9           3            2
  11           7          15            5
  12           6           1            5
Table 9.  Analysis of Variance for Operator Errors

SOURCE                       df        MS           F

Between Subjects             11
  EL (experience level)       1      46.7222      1.8772
  Error b                    10      24.8888

Within Subjects              60
  EM (entry method)           2      52.0972      5.3311*
  TT (task type)              1      14.2222      1.0314
  EL x EM                     2       3.3472       .3425
  EL x TT                     1        .2223       .0161
  EM x TT                     2       8.5972      1.3147
  EL x EM x TT                2       5.8472       .8942
  Error                      20       9.7722
  Error                      10      13.7888
  Error                      20       6.5388

*p<.05

df:  degrees of freedom
MS:  Mean Square
F:   F test ratio
Figure 7.  Average Number of Operator Errors
for Different Experience Levels
Table 9 shows that there is no significant difference
in operator errors over either experience level or task type.
There are also no significant interactions shown in the
table.
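
The F ratios in Table 9 can be checked by dividing each effect's mean square by an error mean square. The pairing of effects with error terms below is inferred from the arithmetic (it is the pairing that reproduces the printed ratios) and is not stated in the table; the dictionary keys and function name are hypothetical.

```python
# Error mean squares from Table 9, keyed by the effects they appear to test.
ERROR_MS = {
    "between subjects": 24.8888,   # df = 10
    "entry method": 9.7722,        # df = 20
    "task type": 13.7888,          # df = 10
    "method x task": 6.5388,       # df = 20
}

# effect: (mean square, assumed error term, F ratio printed in Table 9)
EFFECTS = {
    "EL": (46.7222, "between subjects", 1.8772),
    "EM": (52.0972, "entry method", 5.3311),
    "TT": (14.2222, "task type", 1.0314),
    "EL x EM": (3.3472, "entry method", 0.3425),
    "EL x TT": (0.2223, "task type", 0.0161),
    "EM x TT": (8.5972, "method x task", 1.3147),
    "EL x EM x TT": (5.8472, "method x task", 0.8942),
}


def f_ratio(ms_effect, ms_error):
    """F is the effect mean square divided by its error mean square."""
    return ms_effect / ms_error


for effect, (ms, error_term, printed) in EFFECTS.items():
    computed = f_ratio(ms, ERROR_MS[error_term])
    print(f"{effect:14s} computed F = {computed:7.4f}   printed F = {printed:7.4f}")
```

Each computed ratio agrees with the printed value to within rounding in the fourth decimal place.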

3.  Total Errors

The total errors are the sum of the recognition and
operator errors. The total number of errors for each subject
is given in Table 10 for the task of entering 20 commands, and
in Table 11 for the groups of commands. As for the other
types of errors, an analysis of variance was performed on these
data, with the results presented in Table 12.

The results of the ANOVA show a significant difference
in total errors for entry method. A range test showed
a significant decrease in total errors for the typing mode
when compared with both unbuffered and buffered voice. There
was no significant difference between the two voice
input modes. This result is shown in Figure 8. IT MUST BE
REMEMBERED, HOWEVER, THAT THE TYPING MODE DID NOT INCLUDE
RECOGNITION ERRORS, WHEREAS THE TWO VOICE MODES DID. THEREFORE,
FOR THE VOICE MODES TOTAL ERRORS ARE THE SUM OF OPERATOR
AND RECOGNITION ERRORS, WHILE FOR TYPING TOTAL ERRORS ARE THE
SAME AS OPERATOR ERRORS. THIS CAN BE SEEN BY COMPARING THE
CURVES FOR TYPING IN FIGURES 7 AND 8, WHICH SHOW TYPING WITH
THE EXACT SAME TREND BECAUSE THERE COULD BE NO VOICE RECOGNITION
ERRORS UNDER THE TYPING METHOD.
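
Because total errors are defined as recognition errors plus operator errors, the recognition errors for each subject can be recovered by subtraction. The following is a minimal sketch for the 20-command task, using the counts in Tables 7 and 10; the names are hypothetical.

```python
# Subjects 1-12, 20-command task.
operator_20 = {  # Table 7
    "typing": [13, 1, 4, 6, 4, 10, 7, 9, 5, 4, 3, 5],
    "unbuffered voice": [10, 9, 8, 15, 4, 1, 6, 11, 4, 2, 6, 3],
    "buffered voice": [8, 6, 4, 4, 3, 3, 1, 9, 2, 3, 4, 2],
}
total_20 = {  # Table 10
    "typing": [13, 1, 4, 6, 4, 10, 7, 9, 5, 4, 3, 5],
    "unbuffered voice": [29, 33, 13, 43, 20, 3, 75, 92, 13, 8, 11, 32],
    "buffered voice": [26, 68, 6, 12, 11, 6, 32, 35, 7, 9, 20, 22],
}


def recognition_errors(total, operator):
    """Recognition errors = total errors - operator errors, per subject."""
    return {
        method: [t - o for t, o in zip(total[method], operator[method])]
        for method in total
    }


recognition_20 = recognition_errors(total_20, operator_20)
for method, counts in recognition_20.items():
    print(f"{method:18s} {counts}")
```

Every difference is zero for typing, consistent with the point that the typing mode cannot produce recognition errors.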
Table 10.  Total Errors for 20 Commands

                      UNBUFFERED    BUFFERED
SUBJECT      TYPE       VOICE        VOICE

   1          13          29           26
   2           1          33           68
   3           4          13            6
   4           6          43           12
   5           4          20           11
   6          10           3            6
   7           7          75           32
   8           9          92           35
   9           5          13            7
  10           4           8            9
  11           3          11           20
  12           5          32           22

Table 11.  Total Errors for Five Groups of 4 Commands Each

                      UNBUFFERED    BUFFERED
SUBJECT      TYPE       VOICE        VOICE

   1          11          32           20
   2          10          27           50
   3          13          12           12
   4           6          18           12
   5           3          15           16
   6          16          16            6
   7           4          55           38
   8           9          78           29
   9           4          22           19
  10           9           8            9
  11           7          31           48
  12           6          15           29
Table 12.  Analysis of Variance for Total Errors

SOURCE                       df         MS            F

Between Subjects             11
  EL (experience level)       1      589.3889        .7749
  Error b                    10      760.5722

Within Subjects              60
  EM (entry method)           2     3107.1805       7.6051*
  TT (task type)              1        4.5           .0524
  EL x EM                     2      442.1805       1.0822
  EL x TT                     1       26.8888        .3136
  EM x TT                     2       75.5416       1.8751
  EL x EM x TT                2       66.2639       1.6448
  Error                      20      408.5638
  Error                      10       85.7277
  Error                      20       40.2861

*p<.01

df:  degrees of freedom
MS:  Mean Square
F:   F test ratio
Figure 8.  Average Number of Total Errors
for Different Experience Levels
Table 12 shows that there is no significant difference
in total errors over either experience level or task type.
In addition, there are no significant interactions in the
area of total errors.
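
The main-effect mean squares in Table 12 can be reproduced from the raw counts in Tables 10 and 11. The sketch below uses the usual balanced-design formula, SS = sum(T_i^2)/n - G^2/N, where T_i is the total for level i, n is the number of observations per level, G is the grand total and N = 72. It assumes, following Appendix C, that subjects 1-6 form the experienced group and subjects 7-12 the inexperienced group; the function and variable names are hypothetical.

```python
# Total errors per subject as (typing, unbuffered voice, buffered voice).
table10 = [(13, 29, 26), (1, 33, 68), (4, 13, 6), (6, 43, 12),
           (4, 20, 11), (10, 3, 6), (7, 75, 32), (9, 92, 35),
           (5, 13, 7), (4, 8, 9), (3, 11, 20), (5, 32, 22)]
table11 = [(11, 32, 20), (10, 27, 50), (13, 12, 12), (6, 18, 12),
           (3, 15, 16), (16, 16, 6), (4, 55, 38), (9, 78, 29),
           (4, 22, 19), (9, 8, 9), (7, 31, 48), (6, 15, 29)]
tables = [table10, table11]

n_total = sum(len(row) for table in tables for row in table)    # 72 scores
grand_total = sum(sum(row) for table in tables for row in table)


def main_effect_ms(level_totals):
    """Mean square for a main effect whose levels have equal cell counts."""
    n_per_level = n_total / len(level_totals)
    correction = grand_total ** 2 / n_total
    ss = sum(t ** 2 for t in level_totals) / n_per_level - correction
    return ss / (len(level_totals) - 1)


# Entry method: sum each column over both tables.
em_totals = [sum(row[j] for table in tables for row in table) for j in range(3)]
# Task type: sum each table.
tt_totals = [sum(sum(row) for row in table) for table in tables]
# Experience level: subjects 1-6 versus subjects 7-12 (assumed from Appendix C).
el_totals = [sum(sum(row) for table in tables for row in table[:6]),
             sum(sum(row) for table in tables for row in table[6:])]

ms_em = main_effect_ms(em_totals)
ms_tt = main_effect_ms(tt_totals)
ms_el = main_effect_ms(el_totals)
print(f"MS(EM) = {ms_em:.4f}, MS(TT) = {ms_tt:.4f}, MS(EL) = {ms_el:.4f}")
```

The three values agree with the EM, TT and EL mean squares printed in Table 12 to within rounding.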
V.  CONCLUSIONS AND RECOMMENDATIONS

A.   EXPERIMENTAL  CONCLUSIONS 

Based on the results of this experiment, twelve test
subjects were able to input WES commands to the war game
faster and with fewer total errors using the manual typing
input mode than with the two voice input modes. Subjects
experienced with voice input entered the commands faster than
the inexperienced subjects, but experience level made no
difference in the total number of errors committed. Typing
was significantly better in total errors, but there
was no statistical difference in time between the typing and
buffered voice modes. Finally, for time
and total errors, it made no difference which of the two task
types was being performed.

The results suggest that manual input is certainly superior
to unbuffered voice, and in some respects to buffered
voice input in this experiment. However, the author feels
that this must be qualified by looking at the unique situation
in which the input methods were being used. WES commands
are very formatted and must be entered with no errors.
This requirement caused many commands to be rejected and
resulted in the definite infeasibility of using unbuffered
voice input with WES. It simply took too long and resulted
in too many errors. The buffered voice mode held its own
with typing when considering input time, and it was actually
better than typing for operator errors.

In a task such as running WES, where the goal must be
perfection in entering all player commands in order to
actually play the game, if the time required for two different
input methods is the same, then it appears that their
error rates are of little consequence. In this experiment there was
no statistical difference in time for the typing and buffered
voice input methods, so the fact that buffered voice had more
total errors really makes no difference. The lists of commands
were input and accepted by WES in the same amount of
time regardless of errors.

There are also possible intangible benefits associated
with the use of voice input to a computer, whatever its purpose
might be. One such benefit might be the ability of
supervisors or commanders to hear what is being told to or
asked of a computer while they are still engaged in other
activities. This would eliminate several people leaning
over the shoulder of the operator trying to see what he is
typing into the computer, allowing the operator to perform
his job more easily and probably increasing the total efficiency
in the work area.

B.   RECOMMENDATIONS  FOR  FURTHER  STUDY 

Voice recognition, although very promising in many fields,
is certainly not a panacea for all areas of input to
computers. This can be seen from the unbuffered voice results
with WES. This should not, however, slow down the research
being done in the field of voice recognition. Studies cited
earlier point out very promising uses of voice recognition.
The author believes that further research should be done,
using the buffered voice mode, during WES games to test its
validity in actual use. This could be done quite easily as
thesis research work at NPS, in the C3 laboratory course at
NPS, or in conjunction with scheduled war games involving
NPS, CINCPACFLT and NOSC.

In this experiment the subjects were divided into experienced
and inexperienced groups as far as voice recognition
systems were concerned. However, the fact that the subjects
were not experienced with WES was never taken into account.
Another possible experimental factor might be to compare the
results of experienced and inexperienced WES users. Although
adding variables like this would make it more difficult
to find the required number of subjects, this could be
done at NPS in the C3 curriculum, where the students take
almost all of the same classes for six quarters.

Further research also should be done in the NPS RSM,
perhaps in conjunction with the WES games proposed above, to
study the effects of background and ambient noise on the
reliability of the voice recognition equipment. There will
surely be this noise problem in any operational use of voice
equipment in a command center, CIC or aircraft, and this
could easily be simulated by introducing noise while using
the equipment at NPS.

Research into the possible uses of voice recognition
equipment in aircraft, intelligence, war gaming and other
operational uses is presently ongoing at NPS. These efforts
will result in much new information on the uses and drawbacks
of automatic voice recognition. Truly operational, rather
than merely scholarly and scientific, study in this field
must be continued if we are to reap any benefits from this
new technology available today. This should be an ongoing
endeavor at NPS, and in the C3 curriculum particularly, where
there is such promise and demand for this type of technology
today and in the future.
APPENDIX A


VOICE STUDIES AT NPS


This thesis is one of several voice recognition research
projects conducted for Professor G. K. Poock at NPS over the
last several years. The complete list, in addition to this
thesis, includes:

Armstrong, J. W., The Effects Of Concurrent Motor Tasking On
Performance Of A Voice Recognition System, Masters Thesis,
Naval Postgraduate School, Monterey, 1980.

Batchellor, M. P., Investigation Of Parameters Affecting Voice
Recognition Systems In C3 Systems, Masters Thesis, Naval
Postgraduate School, Monterey, 1981.

Bragaw, P. H., Investigation Of Voice Input For Constructing
Joint Chiefs Of Staff Emergency Action Messages, Masters
Thesis, Naval Postgraduate School, Monterey, 1981.

Jay, G. T., An Experiment In Voice Data Entry For Imagery
Intelligence Reporting, Masters Thesis, Naval Postgraduate
School, Monterey, 1981.

Naval Postgraduate School Report NPS54-80-010, The Effects
Of Certain Background Noises On The Performance Of A Voice
Recognition System, by R. Elster, September 1980.

Naval Postgraduate School Report NPS55-80-016, Experiments
With Voice Input For Command And Control: Using Voice Input
To Operate A Distributed Computer Network, by G. K. Poock,
April 1980.

Naval Postgraduate School Report NPS55-81-003, Examination Of
Voice Recognition System To Function In A Bilingual Mode, by
D. E. Neil and T. Andreason, February 1981.

Taggart, J. L. and Wolfe, C. D., Speech Recognition As An
Input Medium For Preflight In The P3C Aircraft, Masters
Thesis, Naval Postgraduate School, Monterey, 1981.
APPENDIX B


WES VOCABULARY


A.  BASIC WES WORDS

one                         two
three                       four
five                        six
seven                       eight
nine                        zero

e                           air
all                         altitude
at                          attack
back                        barrier
bearing                     bingo
blue                        by
cancel                      carriage return
course                      cover
degrees                     delay
designate                   distance
dive                        drop
east                        end
enemy                       envelope
execute                     exsup
find distance from          fire
fire a                      for
forces                      friendly
go                          guide
heading                     help
if attacked                 kill line
kill word                   label
launch                      lay a barrier from
lay a minefield from        list
maneuver delay              map
minefield                   minutes
name                        neutral
north                       now
of                          off
on                          on contact
orange                      orders
other                       own
pass control of             place
place a circle              place a marker
player                      plot
point                       position
pounds from                 probability
probability of detection    proceed
refuel                      report
self                        send it
sensor delay                south
space                       speed
station                     submarine
surface                     target
time                        to
track                       unknown
using                       west
what is the                 with


B.  SCENARIO SPECIFIC WES WORDS

A181                        A182
A183                        A6E1
A6E2                        ALR59
ARM                         ASMD
ASROC                       AWG9
BERKELEY                    Blue1
BPS14                       BQQ3
CBU24                       E2C
EA3                         EA6B
ENTERPRISE                  ESM
F14B                        for BERKELEY
for ENTERPRISE              for STURGEON
G554                        HARM
Harpoon                     KA6D
Maverick                    MK46
MK49                        MK82
MK83                        MK84
Phoenix                     Redeye
RF18B                       Sea Sparrow
Sidewinder                  SLQ17
SLQ32                       Sonobuoy Active
Sonobuoy Passive            Sparrow
SPN43                       SPS10
SPS40                       SPS48
SPS49                       SQS23
STURGEON                    Tartar2
Tomahawk                    Walleye
Walleye2                    WLR6
APPENDIX C

TEST SUBJECTS' BACKGROUNDS

                                          Voice         Typing Ability
Subject   Service   Sex   Position      Experience          (wpm)

   1       USAF      M    student       experienced           32
   2       USN       F    student       experienced           59
   3       USAF      M    student       experienced           46
   4       USN       M    student       experienced           17
   5       USN       F    faculty       extensive             34
   6       Civ       M    faculty       extensive             39
   7       USN       M    student       minimal               21
   8       USAF      M    student       minimal               39
   9       USN       M    student       minimal               38
  10       USAF      M    student       minimal               37
  11       USA       M    student       minimal               37
  12       USN       M    student       minimal               26
APPENDIX D


TYPING ABILITY TEST


Because they have often learned to know types of architecture
by decoration, casual observers sometimes fail to
realize that the significant part of a structure is not the
ornamentation but the body itself. Architecture, because of
its close contact with human lives, is peculiarly and intimately
governed by climate. For instance, a home built for
comfort in the cold and snow of the northern areas of this
country would be unbearably warm in a country with weather
such as that of Cuba. A Cuban house, with its open court,
would prove impossible to heat in a northern winter.

Since the purpose of architecture is the construction of
shelters in which human beings may carry on their numerous
activities, the designer must consider not only climatic conditions,
but also the function of a building. Thus, although
the climate of a certain locality requires that an auditorium
and a hospital have several features in common, the purposes
for which they will be used demand some difference in structure.
For centuries builders have first complied with these
two requirements and later added whatever ornamentation they
wished. Logically, we should see as mere additions, not as
basic parts, the details by which we identify architecture.
                   wpm (errors allowed)
Line Number       1st            2nd
                 typing of the exercise
                  (5 minutes maximum)

     1            2( )          52(7)
     2            5( )          54(7)
     3            7( )          56(8)
     4            9( )          59(8)
     5           12( )          61(9)
     6           14( )          64(9)
     7           16( )          66(10)
     8           18( )          68(10)
     9           21( )          71(11)
    10           23( )          73(11)
    11           26( )          76(12)
    12           28( )          78(12)
    13           30( )          80(12)
    14           33( )
    15           35( )
    16           38( )
    17           40(3)
    18           42(4)
    19           44(5)
    20           47(6)
    21           49(6)
APPENDIX E


LIST OF 20 WES COMMANDS


1.  For Enterprise launch 2 F14B course 090 altitude
15000 bingo 999 name 1F14B.

2.  For Berkeley attack enemy surface on contact using G554.

3.  Find distance from Enterprise to 42N 57W.

4.  For Sturgeon course 090 speed 15.

5.  Place a circle Enterprise 150 time 15 999.

6.  For Sturgeon report all surface using BQQ3.

7.  For Berkeley fire a harpoon target enemy surface
sensor delay 2 heading 120.

8.  Pass control of 1F14B to Blue1.

9.  For 1F14B lay a minefield from 26N 42W bearing
135 distance 10 using MK82.

10.  Place a marker 57N 71W time 23 300.

11.  For 1F14B proceed course 215 distance 115.

12.  For Berkeley station bearing 000 distance 3
guide Enterprise.

13.  For Sturgeon attack enemy submarine on contact
using MK48.

14.  For 1F14B altitude 20000 speed 600 course 090.

15.  For Enterprise report all air using SPS49
time 00 999.

16.  For 1F14B attack enemy air on contact using Phoenix.

17.  For Enterprise launch 1 KA6D course 000
altitude 10000 bingo 120 name 1KA6D.

18.  Plot all surface Enterprise 100.

19.  For Berkeley report enemy forces using SLQ32
time 00 120.

20.  For 1F14B refuel 6000 pounds from 1KA6D.
APPENDIX F


FIVE LISTS OF FOUR WES COMMANDS


I.

1.  For Berkeley attack enemy surface on contact using G554.

2.  Find distance from Enterprise to 42N 57W.

3.  For Berkeley fire a harpoon target enemy surface
sensor delay 2 heading 120.

4.  Plot all surface Enterprise 100.

II.

1.  For Sturgeon course 090 speed 15.

2.  Place a circle Enterprise 150 time 15 999.

3.  For Enterprise launch 2 F14B course 090
altitude 15000 bingo 999 name 1F14B.

4.  For Berkeley report enemy forces using SLQ32
time 00 120.

III.

1.  Pass control of 1F14B to Blue1.

2.  Place a marker 57N 71W time 23 300.

3.  For Berkeley station bearing 000 distance 3
guide Enterprise.

4.  For Enterprise launch 1 KA6D course 000
altitude 10000 bingo 120 name 1KA6D.

IV.

1.  For Sturgeon report all surface using BQQ3.

2.  For 1F14B proceed course 215 distance 115.

3.  For Sturgeon attack enemy submarine on contact
using MK48.

4.  For 1F14B attack enemy air on contact using Phoenix.

V.

1.  For 1F14B lay a minefield from 26N 42W
bearing 135 distance 10 using MK82.

2.  For Enterprise report all air using SPS49
time 00 999.

3.  For 1KA6D altitude 20000 speed 600 course 090.

4.  For 1F14B refuel 6000 pounds from 1KA6D.
APPENDIX G

INSTRUCTIONS FOR SUBJECTS

You will be inputting a set of 20 commands and five sets
of four commands each to the WES game by typing, unbuffered
voice and buffered voice.

If you make a mistake in either typing or unbuffered modes,
carriage return right away to save time since it can't be
corrected. Then reenter the command correctly.

In the buffered mode you can use kill word or kill line
to make changes before entering your commands.

Input the commands as quickly as possible since you are
being timed, but they must also be 100 percent accurate
and accepted by WES.

Remember to input a "space" after numbers you enter. All
words automatically have a space with them.

Remember that the words "for" and "to" were trained as
"for the" and "to the" to differentiate them from the
numbers 4 and 2. If you forget "the," the utterance will
be recognized as the number.

All phrases which were trained as a single utterance
(e.g., pass control of) are highlighted in yellow so you
won't have to try to remember the phrases. Remember to
speak them as a single utterance.

Ensure the microphone is correctly positioned, and if it
moves, stop and reposition it.

The green READY light must be on for the T600 to accept
your utterance. Allow a short pause between each utterance
for it to come back on.

Use of a forceful tone of voice produces the best results,
and try not to draw out the utterance by a breathing noise
at the end.
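
The buffered-mode rules above can be pictured as a small editing buffer. The sketch below is an illustration only and not the T600's actual logic. It assumes that "kill word" discards the most recent utterance, that "kill line" discards everything spoken so far, and that "carriage return" sends the buffered line to WES. It models the spacing rule by attaching a space to every word but not to digits. The class, method and function names are hypothetical.

```python
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}


class UtteranceBuffer:
    """Holds recognized utterances until the operator sends the line."""

    def __init__(self):
        self.pieces = []

    def hear(self, utterance):
        """Process one utterance; return the finished line when it is sent."""
        if utterance == "kill word":
            if self.pieces:
                self.pieces.pop()          # drop the last utterance (assumed)
        elif utterance == "kill line":
            self.pieces.clear()            # start the command over
        elif utterance == "carriage return":
            line = "".join(self.pieces).strip()
            self.pieces.clear()
            return line                    # assumed to be what WES receives
        elif utterance == "space":
            self.pieces.append(" ")        # spoken explicitly after numbers
        elif utterance in DIGITS:
            self.pieces.append(DIGITS[utterance])
        else:
            self.pieces.append(utterance + " ")   # words carry their own space
        return None


def enter(utterances):
    """Feed a sequence of utterances and collect every line that is sent."""
    buffer = UtteranceBuffer()
    sent = []
    for utterance in utterances:
        line = buffer.hear(utterance)
        if line is not None:
            sent.append(line)
    return sent


spoken = ["for Sturgeon", "course", "zero", "nine", "one", "kill word",
          "zero", "space", "speed", "one", "five", "carriage return"]
print(enter(spoken))
```

In this example the wrong digit is removed with "kill word" before the line is sent, so the command is transmitted once and intact instead of being rejected and reentered.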
APPENDIX H


PRACTICE VOICE COMMANDS


1.  For E2C lay a barrier from 36N 76W bearing 180
distance 100 using sonobuoy passive.

2.  For 1F14B bingo.

3.  For EA3 proceed position 27N 183E.

4.  For 1F14B speed 1200 course 090 altitude 10000.

5.  Designate Enterprise 77.1.


